fix(pipeline): fail-loudly hygiene + operator override hatches#16
Open
GrigoryEvko wants to merge 2 commits into
Open
fix(pipeline): fail-loudly hygiene + operator override hatches#16GrigoryEvko wants to merge 2 commits into
GrigoryEvko wants to merge 2 commits into
Conversation
…s / honor force= override
Three small fixes to the pipeline-builder seam, theme: configuration
that silently does the wrong thing is harder to debug than configuration
that fails at startup.
`StageRegistry.register` silently overwrote `cls._stages[class_name]` on
duplicate registration. Two modules decorating distinct `Stage` subclasses
with the same class name would shadow each other invisibly; downstream
`get_stage(name).import_path` lookups would resolve to whichever decorator
ran last (load-order dependent). Now raises `ValueError` listing both
import paths; tests that intentionally re-register can call `clear()`
between (existing pattern).
`DefaultPipelineBuilder._contribute_default_deps` reassigned the whole
`self._deps` dict via `self._deps = {...}` literal. A subclass that
overrides `_contribute_default_deps`, adds its own deps, and calls
`super()._contribute_default_deps()` afterwards would silently lose
everything it added. Switched to per-key
`self._deps.setdefault(stage, []).extend(deps)` — subclass-added deps
for unrelated stages survive, and deps for stages the parent also
contributes are concatenated rather than overwritten.
`select_pipeline_builder` had no way for an operator to bypass the
`problem_context.is_contextual` heuristic. When the auto-detection
guesses wrong (or the operator wants to force a specific builder for a
run), the only escape was switching the whole pipeline YAML. Added
`force: str | None = None` kwarg accepting `"context"` / `"default"` —
overrideable from Hydra via `+pipeline_builder.force=default`. Invalid
values raise so typos surface immediately rather than falling back to
heuristic.
5 files changed (3 source + 2 test), +98 / -2. All 4493 tests pass.
Hunter-resolver pass on PC2/PC3/PC4 surfaced three more instances of the same anti-patterns in the same files this PR touches. `PipelineBuilder.add_stage` had the exact `StageRegistry.register` silent-overwrite pattern: `self._nodes[name] = factory` keyed by a caller-supplied string with no collision guard. The class already exposes `replace_stage` as the explicit overwrite escape hatch, so making `add_stage` raise on duplicate is consistent with intent. A subclass copy-pasting a parent `add_stage` call now fails at construction instead of producing a working-but-wrong pipeline (the second factory shadows the first, but downstream wiring still references the first by name). `CMAOptPipelineBuilder` and `OptunaOptPipelineBuilder` both branched on `ctx.problem_ctx.is_contextual` to decide whether to add context wiring, with no operator override — same PC4 shape as `select_pipeline_builder`. Added `force: str | None = None` kwarg (`"context"` / `"default"` / None) to each constructor, overrideable via Hydra `+pipeline_builder.force=default`. Factored the validation into a `_resolve_context_flag` helper that `select_pipeline_builder` now delegates to for consistency. All 6 production builder constructions verified to construct cleanly under the new `add_stage` raise (no latent duplicate add_stage in DefaultPipelineBuilder, ContextPipelineBuilder, CMAOptPipelineBuilder, OptunaOptPipelineBuilder for both contextual and non-contextual problem contexts). Regression tests added (97 unit tests pass; full repo 4500 still green): - `tests/entrypoint/test_default_pipelines.py::TestPipelineBuilder::test_add_stage_duplicate_raises` - `TestCMAOptPipelineBuilder::test_force_default_skips_context_when_inferred_contextual` - `TestCMAOptPipelineBuilder::test_force_context_adds_context_when_inferred_non_contextual` - `TestCMAOptPipelineBuilder::test_force_invalid_raises` - `TestOptunaOptPipelineBuilder::test_force_default_skips_context_when_inferred_contextual` - `TestOptunaOptPipelineBuilder::test_force_context_adds_context_when_inferred_non_contextual` - `TestOptunaOptPipelineBuilder::test_force_invalid_raises`
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Four small hygiene fixes around pipeline construction. Theme: configuration surfaces that silently accept ambiguous input are harder to debug than ones that fail loudly.
StageRegistry.registersilently overwrotecls._stages[class_name]on duplicate registration. Two modules decorating distinctStagesubclasses with the same class name would shadow each other invisibly; downstreamget_stage(name).import_pathlookups would resolve to whichever decorator imported last (load-order dependent). Now raisesValueErrorlisting both import paths. Tests that intentionally re-register can callStageRegistry.clear()between registrations — the existing escape hatch.PipelineBuilder.add_stagehad the same silent-overwrite pattern:self._nodes[name] = factorykeyed by a caller-supplied string with no collision guard. The class already exposes areplace_stagemethod as the explicit overwrite path, so makingadd_stageraise on duplicate is consistent with intent. A subclass that copy-pastes a parent'sadd_stagecall now fails at construction instead of producing a pipeline where downstream wiring references the original factory by name while the registry holds the shadow.DefaultPipelineBuilder._contribute_default_depsreassigned the wholeself._depsdict viaself._deps = {...}literal. A subclass that overrides_contribute_default_deps, adds its own deps toself._deps, and then callssuper()._contribute_default_deps()would silently lose everything it added. Switched to per-keyself._deps.setdefault(stage, []).extend(deps)— subclass-added deps for unrelated stages survive, and deps for stages the parent also contributes are concatenated rather than overwritten.select_pipeline_builder,CMAOptPipelineBuilder.__init__, andOptunaOptPipelineBuilder.__init__all branched onproblem_context.is_contextualto decide whether to add context wiring, with no operator override path. When the filesystem-inferred heuristic guesses wrong for a specific run, the only escape was switching to a different pipeline YAML. Added aforce: str | None = Nonekwarg accepting"context"/"default"to all three call sites — overrideable from Hydra via+pipeline_builder.force=default. Invalid values raise so typos surface immediately rather than falling through to the heuristic. Factored validation into a shared_resolve_context_flaghelper so the three call sites stay in sync.